Semi-supervised object detection is important for 3D scene understanding because obtaining large-scale 3D bounding box annotations on point clouds is time-consuming and labor-intensive. Existing semi-supervised methods usually employ teacher-student knowledge distillation together with an augmentation strategy to leverage unlabeled point clouds. However, these methods adopt global augmentation with scene-level transformations and hence are sub-optimal for instance-level object detection. In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection. In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds, making the augmented data more useful for object detector training. Extensive experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods under various experimental settings. The source code will be available at https://github.com/nomiaro/OPA.
translated by 谷歌翻译
对于单眼深度估计,获取真实数据的地面真相并不容易,因此通常使用监督的合成数据采用域适应方法。但是,由于缺乏实际数据的监督,这仍然可能会导致较大的域间隙。在本文中,我们通过从真实数据中生成可靠的伪基础真理来开发一个域适应框架,以提供直接的监督。具体而言,我们提出了两种用于伪标记的机制:1)通过测量图像具有相同内容但不同样式的深度预测的一致性,通过测量深度预测的一致性; 2)通过点云完成网络的3D感知伪标记,该网络学会完成3D空间中的深度值,从而在场景中提供更多的结构信息,以完善并生成更可靠的伪标签。在实验中,我们表明我们的伪标记方法改善了各种环境中的深度估计,包括在训练过程中使用立体声对。此外,该提出的方法对现实世界数据集中的几种最新无监督域的适应方法表现出色。
translated by 谷歌翻译
由于球形摄像机的兴起,单眼360深度估计成为许多应用(例如自主系统)的重要技术。因此,提出了针对单眼360深度估计的最新框架,例如Bifuse中的双预测融合。为了训练这样的框架,需要大量全景以及激光传感器捕获的相应深度地面真相,这极大地增加了数据收集成本。此外,由于这样的数据收集过程是耗时的,因此将这些方法扩展到不同场景的可扩展性成为一个挑战。为此,从360个视频中进行单眼深度估计网络的自我培训是减轻此问题的一种方法。但是,没有现有的框架将双投射融合融合到自我训练方案中,这极大地限制了自我监督的性能,因为Bi-Prodoction Fusion可以利用来自不同投影类型的信息。在本文中,我们建议Bifuse ++探索双投影融合和自我训练场景的组合。具体来说,我们提出了一个新的融合模块和对比度感知的光度损失,以提高Bifuse的性能并提高对现实世界视频的自我训练的稳定性。我们在基准数据集上进行了监督和自我监督的实验,并实现最先进的性能。
translated by 谷歌翻译
近期云的自我监督学习最近取得了很大的关注,因为它在点云任务上解决了标签效率和域间隙问题。在本文中,我们提出了一种新颖的自我监督框架,用于学习部分点云的信息陈述。我们利用包含内容和姿势属性的LIDAR扫描的部分点云,我们表明解开部分点云等两个因素增强了特征表示学习。为此,我们的框架由三个主要部分组成:1)完成网络以捕获点云的整体语义; 2)一个姿势回归网络,了解从扫描部分数据的视角; 3)局部重建网络,以鼓励模型学习内容和构成功能。为了展示学习特征表示的稳健性,我们开展了几个下游任务,包括分类,部分分割和登记,并进行了最先进的方法的比较。我们的方法不仅优于现有的自我监督方法,而且还展示了合成和现实世界数据集的更好普遍性。
translated by 谷歌翻译
我们介绍360-DFPE,一个顺序楼层平面图估计方法,直接将360图像视为输入,而不依赖于有源传感器或3D信息。我们的方法利用单眼视觉SLAM解决方案和单眼360室布局方法之间的松散耦合集成,分别估计相机姿势和布局几何形状。由于我们的任务是使用单眼图像,整个场景结构,房间实例和房间形状顺序捕获平面图。为了解决这些挑战,我们首先通过制定熵最小化过程来处理视觉内径和布局几何形状之间的比例差异,这使我们能够直接对准360布局而不提前了解整个场景。其次,为了顺序识别各个房间,我们提出了一种新颖的室内识别算法,其使用几何信息沿着相机探索跟踪每个房间。最后,为了估算房间的最终形状,我们提出了一种最短的路径算法,具有迭代的粗细策略,这改善了具有更高精度和更快的运行时间的现有制剂。此外,我们收集一个具有具有挑战性的大型场景的新楼层规划数据集,提供了点云和顺序360图像信息。实验结果表明,我们的单眼解决方案实现了依赖于活动传感器的当前最先进的算法的良好性能,并提前要求整个场景重建数据。我们的代码和数据集将很快发布。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, when this algorithm becomes impractical, we propose a latent variable model specific to the data generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.
translated by 谷歌翻译
Importance: Social determinants of health (SDOH) are known to be associated with increased risk of suicidal behaviors, but few studies utilized SDOH from unstructured electronic health record (EHR) notes. Objective: To investigate associations between suicide and recent SDOH, identified using structured and unstructured data. Design: Nested case-control study. Setting: EHR data from the US Veterans Health Administration (VHA). Participants: 6,122,785 Veterans who received care in the US VHA between October 1, 2010, and September 30, 2015. Exposures: Occurrence of SDOH over a maximum span of two years compared with no occurrence of SDOH. Main Outcomes and Measures: Cases of suicide deaths were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. We developed an NLP system to extract SDOH from unstructured notes. Structured data, NLP on unstructured data, and combining them yielded seven, eight and nine SDOH respectively. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were estimated using conditional logistic regression. Results: In our cohort, 8,821 Veterans committed suicide during 23,725,382 person-years of follow-up (incidence rate 37.18 /100,000 person-years). Our cohort was mostly male (92.23%) and white (76.99%). Across the six common SDOH as covariates, NLP-extracted SDOH, on average, covered 84.38% of all SDOH occurrences. All SDOH, measured by structured data and NLP, were significantly associated with increased risk of suicide. The SDOH with the largest effects was legal problems (aOR=2.67, 95% CI=2.46-2.89), followed by violence (aOR=2.26, 95% CI=2.11-2.43). NLP-extracted and structured SDOH were also associated with suicide. Conclusions and Relevance: NLP-extracted SDOH were always significantly associated with increased risk of suicide among Veterans, suggesting the potential of NLP in public health studies.
translated by 谷歌翻译
A household robot should be able to navigate to target locations without requiring users to first annotate everything in their home. Current approaches to this object navigation challenge do not test on real robots and rely on expensive semantically labeled 3D meshes. In this work, our aim is an agent that builds self-supervised models of the world via exploration, the same as a child might. We propose an end-to-end self-supervised embodied agent that leverages exploration to train a semantic segmentation model of 3D objects, and uses those representations to learn an object navigation policy purely from self-labeled 3D meshes. The key insight is that embodied agents can leverage location consistency as a supervision signal - collecting images from different views/angles and applying contrastive learning to fine-tune a semantic segmentation model. In our experiments, we observe that our framework performs better than other self-supervised baselines and competitively with supervised baselines, in both simulation and when deployed in real houses.
translated by 谷歌翻译
Objective: Evictions are involved in a cascade of negative events that can lead to unemployment, homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction incidences and their attributes from electronic health record (EHR) notes. Materials and Methods: We annotated eviction status in 5000 EHR notes from the Veterans Health Administration. We developed a novel model, called Knowledge Injection based on Ripple Effects of Social and Behavioral Determinants of Health (KIRESH), that has shown to substantially outperform other state-of-the-art models such as fine-tuning pre-trained language models like BioBERT and Bio_ClinicalBERT. Moreover, we designed a prompt to further improve the model performance by using the intrinsic connection between the two sub-tasks of eviction presence and period prediction. Finally, we used the Temperature Scaling-based Calibration on our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalance dataset. Results: KIRESH-Prompt achieved a Macro-F1 of 0.6273 (presence) and 0.7115 (period), which was significantly higher than 0.5382 (presence) and 0.67167 (period) for just fine-tuning Bio_ClinicalBERT model. Conclusion and Future Work: KIRESH-Prompt has substantially improved eviction status classification. In future work, we will evaluate the generalizability of the model framework to other applications.
translated by 谷歌翻译